Remove _supports_static_cache = True for some model classes #34975
Conversation
@@ -330,6 +330,8 @@ def forward(self, hidden_states):
         ) # [num_tokens, num_experts]
         gates = zeros.scatter(1, top_k_indices, 1) # [num_tokens, num_experts]
         expert_size = gates.long().sum(0) # [num_experts,]
+        # (This causes torch.compile to fail with `torch._dynamo.exc.Unsupported: Backend compiler failed with a fake tensor exception at`)
+        # (and `DataDependentOutputException`)
         expert_size = expert_size.tolist()
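(For context, a minimal sketch, not from the PR, of why `.tolist()` trips `torch.compile`: the result is data-dependent, so Dynamo cannot capture it in a full graph. The function name and shapes below are made up.)

```python
import torch

def moe_dispatch(gates: torch.Tensor):
    expert_size = gates.long().sum(0)  # [num_experts,]
    # Converting a traced tensor to a Python list is a data-dependent output:
    # Dynamo cannot represent it, so fullgraph compilation fails here.
    return expert_size.tolist()

compiled = torch.compile(moe_dispatch, fullgraph=True)
try:
    compiled(torch.ones(8, 4))
except Exception as e:  # expected: torch._dynamo.exc.Unsupported
    print(type(e).__name__, e)
```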
jimba has the line `expert_size = expert_size.tolist()` too, and it has no `_supports_static_cache = True`. Let's do the same for this model.
@@ -1155,7 +1156,7 @@ def forward(
         elif position_ids is None:
             position_ids = cache_position.unsqueeze(0)

         if (pixel_values, image_encoder_embeddings, perceiver_embeddings).count(None) != 2:
This will fail torch.compile with another, different type of error.
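(As a hedged aside, not what this PR ends up doing: an identity-based check avoids tracing any Tensor comparison at all. The helper below is hypothetical.)

```python
from typing import Optional

import torch

def check_exactly_one_image_input(
    pixel_values: Optional[torch.Tensor],
    image_encoder_embeddings: Optional[torch.Tensor],
    perceiver_embeddings: Optional[torch.Tensor],
) -> None:
    # `is not None` is a pure Python identity test, so no tensor-valued `==`
    # ever enters the traced graph (unlike tuple.count(None), which may
    # compare Tensor elements against None via `==`).
    provided = sum(
        x is not None
        for x in (pixel_values, image_encoder_embeddings, perceiver_embeddings)
    )
    if provided != 1:
        raise ValueError("Exactly one of the three image inputs must be provided.")
```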
kindly ping @ArthurZucker
Very nice PR, thanks for trying to fix these!
@@ -868,6 +868,8 @@ def forward(
         )
         hidden_states = nn.functional.dropout(hidden_states, p=self.config, training=self.training)
         # Fill in zeros for cross_attention hidden_states of tokens attending to no images
+        # (This causes torch.compile to fail with `torch._dynamo.exc.Unsupported: dynamic shape operator: aten.nonzero.default`)
+        # (setting torch._dynamo.config.capture_dynamic_output_shape_ops = True may help, but this is not tested)
         hidden_states[cross_attention_gate == 0] = hidden_states[cross_attention_gate == 0].fill_(0)
Mmm, I am wondering if using `torch.masked_fill` would be better here and would avoid the graph break?
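(A sketch of the two formulations side by side, with assumed shapes: `hidden_states` is `[batch, seq, dim]` and `cross_attention_gate` is `[batch, seq]`.)

```python
import torch

def zero_with_indexing(hidden_states: torch.Tensor, gate: torch.Tensor) -> torch.Tensor:
    # Boolean indexing lowers to aten.nonzero, whose output shape depends on
    # the data: this is the `dynamic shape operator` failure quoted above.
    hidden_states[gate == 0] = hidden_states[gate == 0].fill_(0)
    return hidden_states

def zero_with_masked_fill(hidden_states: torch.Tensor, gate: torch.Tensor) -> torch.Tensor:
    # masked_fill keeps every shape static, so tracing does not break.
    return hidden_states.masked_fill((gate == 0)[:, :, None], 0.0)
```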
Using `hidden_states.masked_fill((cross_attention_gate == 0)[:, :, None], 0.0)` seems to avoid the failure here, but I got another failure:
E torch._dynamo.exc.UserError: Dynamic control flow is not supported at the moment. Please use functorch.experimental.control_flow.cond to explicitly capture the control flow. For more information about this error, see: https://pytorch.org/docs/main/generated/exportdb/index.html#cond-operands
E
E from user code:
E File "/transformers/src/transformers/models/idefics/modeling_idefics.py", line 1609, in forward
E outputs = self.model(
E File "/transformers/src/transformers/models/idefics/modeling_idefics.py", line 1315, in forward
E layer_outputs = vblock(
E File "/transformers/src/transformers/models/idefics/modeling_idefics.py", line 1277, in vblock
E layer_outputs = main_block(
E File "/transformers/src/transformers/models/idefics/modeling_idefics.py", line 719, in forward
E hidden_states, self_attn_weights, present_key_value = self.self_attn(
E File "/transformers/src/transformers/models/idefics/modeling_idefics.py", line 611, in forward
E cos, sin = self.rotary_emb(value_states, seq_len=max(kv_seq_len, q_len))
E File "/transformers/src/transformers/models/idefics/modeling_idefics.py", line 431, in forward
E if seq_len > self.max_seq_len_cached:
I can find `if seq_len > self.max_seq_len_cached:` in many modeling files, like llama, but there it sits behind

if "dynamic" in self.rope_type:
    self._dynamic_frequency_update(position_ids, device=x.device)

which is not something idefics has. Anyway, I will update the line to use `masked_fill`.
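(A minimal, simplified sketch of the pattern the traceback points at: a Python `if` over a value that is tensor-valued at trace time is dynamic control flow for `torch.export`. The toy module below is my own illustration, not the idefics code.)

```python
import torch

class ToyRotary(torch.nn.Module):
    def __init__(self, max_seq_len_cached: int = 16):
        super().__init__()
        self.max_seq_len_cached = max_seq_len_cached

    def forward(self, x: torch.Tensor, kv_seq_len: torch.Tensor) -> torch.Tensor:
        # When kv_seq_len is a traced tensor, this branch cannot be captured,
        # and export suggests functorch.experimental.control_flow.cond.
        if kv_seq_len.max() > self.max_seq_len_cached:
            return x * 2.0
        return x
```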
@ArthurZucker Updated with your suggestion, thanks a lot! But as mentioned above, it still fails to compile, with other errors 😢
Force-pushed from b371e98 to 96ed7b3.
Thanks for iterating!
# (This causes torch.compile to fail with `torch._dynamo.exc.Unsupported: Backend compiler failed with a fake tensor exception at`)
# (and `DataDependentOutputException`)
Yes, `tolist` is obviously wrong!
@@ -868,7 +868,7 @@ def forward(
         )
         hidden_states = nn.functional.dropout(hidden_states, p=self.config, training=self.training)
         # Fill in zeros for cross_attention hidden_states of tokens attending to no images
-        hidden_states[cross_attention_gate == 0] = hidden_states[cross_attention_gate == 0].fill_(0)
+        hidden_states = hidden_states.masked_fill((cross_attention_gate == 0)[:, :, None], 0.0)
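(A quick sanity check of my own, with made-up shapes, that the two formulations agree numerically:)

```python
import torch

hs = torch.randn(2, 5, 4)
gate = torch.tensor([[1.0, 0.0, 1.0, 0.0, 1.0],
                     [0.0, 1.0, 1.0, 1.0, 0.0]])

ref = hs.clone()
ref[gate == 0] = 0.0  # old in-place indexing formulation
new = hs.masked_fill((gate == 0)[:, :, None], 0.0)  # new formulation

assert torch.equal(ref, new)
```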
nice
run-slow: idefics
This comment contains run-slow, running the specified jobs: ['models/idefics'] ...
Failing tests are irrelevant to this PR and already failing on
…gface#34975) * use mask_fill * remove comment --------- Co-authored-by: ydshieh <[email protected]>
What does this PR do?
Remove `_supports_static_cache = True` for some model classes; see the comments in the changes. The flag was `True` for these models simply because static cache can be used without `torch.compile`. But after #34247, static cache is effectively tied to `torch.compile`, so we should only say a model supports it if it works with `torch.compile`.
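(For reference, a hedged sketch of what the flag now implies; the checkpoint name is only an example. With `cache_implementation="static"`, `generate()` goes through the static-cache decoding path that, as I read the PR description, is now tied to `torch.compile`, so the flag should only be set when that path actually works.)

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-3.2-1B"  # example; assumes this class sets _supports_static_cache = True
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tok("Hello", return_tensors="pt")
# Requesting a static cache is what ties generation to torch.compile here.
out = model.generate(**inputs, cache_implementation="static", max_new_tokens=8)
print(tok.decode(out[0], skip_special_tokens=True))
```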